Abstract

We describe a machine-learning system that uses linear vector-space based techniques for inference from obser- 
vations to extend previous work on model construction for particle physics [10, 9, 5]. The program searches for 
quantities conserved in all reactions from a given input set; given data based on frequent decays it rediscovers the 
family conservation laws: baryon#, electron#, muon# and tau#. We show that these families are uniquely determined
by frequent decay data. 

1 Introduction: Automated Search for Conserved Quantities 

One of the goals of particle physics theory is to find symmetries in particle interactions. This challenge has led to 
the discovery of new conservation laws for particle interactions. An important class of laws is that of additive conservation
laws, or selection rules. These selection rules are based on quantum numbers, physical quantities assigned to each
particle. Table 1 shows the values of important quantum numbers for a set of particles [12]. The Standard Model with 
massless neutrinos includes the conservation of these quantum numbers [11]. For brevity, in the following we refer to 
the quantities {charge, baryon, electron, muon, tau} denoted {C;BEMT} as the "standard model quantities". 

In this paper we present an algorithm for discovering conserved quantities from given particle reaction data pro- 
vided by the user. Our methods are based on new techniques for machine learning in linear spaces, drawing on 
several new theorems in linear algebra. The goal of our system is to facilitate data exploration and automated model 
construction [1, 6, 8]. 

We apply our system to investigate selection rules for data based on frequent decay modes. These reactions do 
not include neutrino oscillations or chiral anomalies. Historically, the standard model selection rules were discovered 
incrementally by adding new rules in response to more evidence [4], [7]. Our program starts fresh and looks for a 
set of selection rules that is optimal for the input data. With the aid of the program, we can systematically explore 
alternative rules and investigate which features of the current conservation laws are particular and which are invariant. 

We prove several mathematical theorems that apply to any class of reactions (e.g., strong interactions, weak inter- 
actions, all allowed interactions). In contrast, the findings of our data analysis hold only for reactions in our data set 
which is based on frequent decays. This paper describes the following results: 

1. For any class of reactions, the number of independent quantities conserved in the reaction class is no greater
than the number of particles with no decay mode in the class. 

2. For the frequent decay reactions considered in our dataset, the standard model quantities {C;BEMT} are com- 
plete in the sense that every other quantity conserved in these reactions is a linear combination of {C;BEMT}. 

3. The particle families corresponding to {BEMT} are uniquely determined by the frequent decay data in our data 
set. 

The next section presents our algorithm and findings, followed by a summary. We briefly discuss applying our data 
analysis system to reactions that involve neutrino oscillation and/or chiral anomalies. Section 4 describes our dataset 
and gives formal proofs of the new linear algebra theorems in this paper. 
Table 1: Some Common Particles and Quantum Number Assignments
Table 2: The Representation of Reactions and Quantum Numbers as n-vectors



2 Algorithm and Results 

We represent reactions and quantum numbers as n-vectors [9] [2], with the known particles numbered as $p_1, \ldots, p_n$.
Given a reaction r, we subtract the number of occurrences of $p_i$ among the products from the number among the reagents
to obtain the net occurrence of particle $p_i$ in reaction r. For example, in the transition $p + p \to p + p + \pi^0$, the net
occurrence of $p$ is 0 and that of $\pi^0$ is $-1$. The n-vector r is then defined by setting r(i) = the net occurrence of $p_i$,
where r(i) denotes the i-th component of the n-vector r. Since net occurrences are integers, we refer to n-vectors with
integer entries as reaction vectors.

A quantum number is represented by an n-vector q, where q(i) = the quantum number for particle $p_i$. For
example, if particle $p_1$ carries one unit of negative electric charge, as the electron does, then charge(1) = $-1$. If q is a
quantum number and r a reaction, then q is conserved in r iff $q \cdot r = 0$. If we write E for the set of input vectors that
represent experimentally established reactions, the space of quantities conserved in all reactions in E is the orthogonal
complement $E^\perp$. We generally write E for a set of input
reactions to be analyzed by our algorithm, and R for an arbitrary class of reactions for which we prove a mathematical
theorem. Table 2 illustrates these concepts.
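
The representation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the six-particle list, the names `reaction_vector` and `is_conserved`, and the two sample reactions are hypothetical choices made for the example and are not taken from the 193-particle database.

```python
from collections import Counter

# Hypothetical toy particle list; the paper's database has n = 193 particles.
PARTICLES = ["p", "n", "pi0", "pi-", "e-", "nu_e_bar"]

def reaction_vector(reagents, products):
    """Net occurrence of each particle: count among reagents minus count among products."""
    net = Counter(reagents)
    net.subtract(Counter(products))
    return [net[name] for name in PARTICLES]

def is_conserved(q, r):
    """A quantity q is conserved in reaction r iff the dot product q . r is zero."""
    return sum(qi * ri for qi, ri in zip(q, r)) == 0

# Quantum number assignments for the toy particles, listed in PARTICLES order.
charge = [1, 0, 0, -1, -1, 0]
baryon = [1, 1, 0, 0, 0, 0]

# p + p -> p + p + pi0 (the example from the text) and n -> p + e- + anti-nu_e.
pion_production = reaction_vector(["p", "p"], ["p", "p", "pi0"])
beta_decay = reaction_vector(["n"], ["p", "e-", "nu_e_bar"])
# A process that violates charge conservation: n -> p + pi0.
forbidden = reaction_vector(["n"], ["p", "pi0"])

print(pion_production, is_conserved(charge, pion_production))
print(beta_decay, is_conserved(charge, beta_decay), is_conserved(baryon, beta_decay))
print(forbidden, is_conserved(charge, forbidden))
```

Storing each reaction as an integer vector means that checking a candidate selection rule reduces to a single dot product per reaction.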

A fundamental principle that has guided the search for conservation principles in particle physics was dubbed 
Gell-Mann's Totalitarian Principle: "Anything which is not prohibited is compulsory" [3]. Thus if a reaction is not
observed, some physical law must forbid it. The more linearly independent quantum numbers we introduce, the more
unobserved processes our conservation principles rule out. Hence we seek a basis in the nullspace of the observed 
reactions. Our analysis considers a comprehensive set of n = 193 particles (see Section 4.1 for details). Reactions 
that are linear combinations of previously observed reactions (when viewed as n-vectors) do not lead to new constraints
on additive selection rules. So we seek a maximal set of linearly independent observed reactions, which the next 
proposition helps us find. 

Proposition 1. Let $d_1, \ldots, d_k$ be a set of vectors for decays of distinct particles. That is, $d_i$ is a process of the form
$p_i \to \ldots$ with $p_i$ as its only reagent. Then conservation of energy and momentum imply that the set $\{d_1, \ldots, d_k\}$ is linearly independent.

To illustrate the proposition, we note that for our 193 particles, there are established decay modes for all but 11 
particles (photon, proton, electron, the three neutrinos, and the respective antiparticles). The proposition guarantees 
that we obtain at least k = 193 - 11 = 182 linearly independent reaction vectors from these decay modes.

Since $\dim(R) + \dim(R^\perp) = n$, where $\dim(R)$ is the dimension of the linear span of R, it follows that if k is the
number of particles with decay modes in a class of reactions R, then $\dim(R) \geq k$, so $\dim(R^\perp) \leq n - k$, which is the
number of particles without a decay mode in the reaction class R. Thus we have a surprising relationship:

Corollary 2. For any class of reactions R, the number of independent quantum numbers conserved in the class is 
bounded by the number of particles without a decay mode in the class. 
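
The bound in Corollary 2 can be checked numerically on a small example. The sketch below is an assumption-laden illustration, not the authors' program: the function `nullspace`, the seven-particle ordering, and the three decays are hypothetical. It computes a basis of the conserved quantities by exact Gauss-Jordan elimination over the rationals and compares its size with the number of particles that have no decay mode.

```python
from fractions import Fraction

def nullspace(rows, n):
    """Basis of {q : q . r = 0 for every row r}, via exact Gauss-Jordan elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots = []
    rank = 0
    for col in range(n):
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column, so it is a free column
        m[rank], m[pivot] = m[pivot], m[rank]
        lead = m[rank][col]
        m[rank] = [x / lead for x in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][col] != 0:
                factor = m[i][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[rank])]
        pivots.append(col)
        rank += 1
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[free] = Fraction(1)
        for row_idx, col in enumerate(pivots):
            v[col] = -m[row_idx][free]
        basis.append(v)
    return basis

# Hypothetical toy ordering: n, pi-, pi0, p, e-, anti-nu_e, gamma.
decays = [
    [1, 0, 0, -1, -1, -1, 0],   # n   -> p + e- + anti-nu_e
    [0, 1, 0, 0, -1, -1, 0],    # pi- -> e- + anti-nu_e
    [0, 0, 1, 0, 0, 0, -2],     # pi0 -> gamma + gamma
]
n_particles = 7
without_decay_mode = n_particles - len(decays)  # p, e-, anti-nu_e, gamma

conserved = nullspace(decays, n_particles)
print(len(conserved), "independent conserved quantities; bound:", without_decay_mode)
```

In this toy case the bound is attained; adding further observed reactions that are not combinations of the decays can only shrink the space of conserved quantities.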

Note that this result holds for partial symmetries too since it holds for any subclass of processes to which the 
symmetry applies. To illustrate, since there are 11 stable particles without any known decay modes, we know a 
priori that there can be at most 11 linearly independent quantities conserved in all processes. In fact, we find a 
stronger necessary relationship between the number of particles without a decay mode and the number of independent 
conserved quantities, if we take into account the matter/antimatter division of elementary particles. Say that a quantum 
number q is coupled with respect to matter/antimatter, or simply coupled, if $q(i) = -q(j)$ when particle $p_i$ is the
antiparticle of $p_j$. The quantum numbers {C;BEMT} are all coupled with respect to matter/antimatter (see Table 1).
The following proposition assumes that particle $p$ has a decay mode just in case its antiparticle $\bar{p}$ does.

Proposition 3. Let s be the number of particle/antiparticle pairs $(p, \bar{p})$ that have no decay mode in a class of reactions
R. Then conservation of energy and momentum imply that there are at most s coupled independent quantum numbers 
conserved in all reactions in the class. 

There are 6 particle/antiparticle pairs without a decay mode: $(p, \bar{p})$, $(e^-, e^+)$, $(\gamma, \gamma)$, $(\nu_e, \bar{\nu}_e)$, $(\nu_\mu, \bar{\nu}_\mu)$, $(\nu_\tau, \bar{\nu}_\tau)$.
Thus the proposition implies that there can be at most 6 coupled linearly independent quantum numbers conserved in 
all allowed reactions. 

Based on Proposition 1, we included in E one decay mode for each particle that has one listed in the particle data 
Review [12], and a number (more than 11) of other known reactions resulting in a total of 205 datapoints. On this 
data, our computation establishes the following. 

Finding 1. The quantum numbers {C;BEMT} form a basis for the nullspace of the reaction dataset E based 
on frequent decays. Therefore, any other quantity conserved in all of these reactions is a linear combination of 
{C;BEMT}.

There are many sets of conserved quantities predictively equivalent to the standard model quantities {C;BEMT}. 
By "predictive equivalence" we mean that any reaction r conserves all the quantities {C;BEMT} if and only if 
r conserves all alternative quantities. In vector terms, a set of alternative quantities $\{q_1, \ldots, q_5\}$ is equivalent to
{C;BEMT} if and only if both sets span the same linear space. Although many alternative theories forbid the same 
reactions, the basis {C;BEMT} has a special feature that singles it out. The key insight is that these quantities not only 
classify reactions into "forbidden" and "allowed", but also group particles into families. Say that particle $p_i$ carries
a quantum number q if $q(i) \neq 0$. For example, the carriers of electron number are the electron, positron, electron
neutrino, electron antineutrino (see Table 1). A set of quantum numbers $\{q_1, \ldots, q_m\}$ forms a family set if no particle
carries two quantities; formally, if $q_i(k) = 0$ whenever $q_j(k) \neq 0$, for all $i \neq j$. The quantum numbers BEMT form
a family set.
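
The definitions of carriers and family sets translate directly into code. The following sketch assumes a hypothetical six-particle list and hypothetical helper names (`carriers`, `is_family_set`); the quantum number values follow the usual assignments but are included only for illustration.

```python
def carriers(q):
    """Indices of the particles that carry quantity q, i.e. those with q(i) != 0."""
    return {i for i, value in enumerate(q) if value != 0}

def is_family_set(quantities):
    """True iff no particle carries two of the given quantities."""
    seen = set()
    for q in quantities:
        c = carriers(q)
        if seen & c:
            return False
        seen |= c
    return True

# Hypothetical toy ordering: p, n, e-, nu_e, mu-, nu_mu.
baryon = [1, 1, 0, 0, 0, 0]
electron = [0, 0, 1, 1, 0, 0]
muon = [0, 0, 0, 0, 1, 1]
charge = [1, 0, -1, 0, -1, 0]

print(is_family_set([baryon, electron, muon]))          # disjoint carriers
print(is_family_set([baryon, electron, muon, charge]))  # charge overlaps the others
```

The second call fails the test because the proton, electron and muon each carry electric charge in addition to their family quantity.
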
Theorem 4. Let $\{q_1, q_2, q_3, q_4\}$ be any family set of quantum numbers such that $\{q_1, q_2, q_3, q_4, C\}$ is predictively
equivalent to the standard model quantities {C;BEMT}. Then for each $q_i$, $i = 1, \ldots, 4$, there is a standard quantity from
{BEMT} such that the carriers of $q_i$ are the carriers of the standard quantity. In other words, any such family set
$\{q_1, q_2, q_3, q_4\}$ determines the same particle families as {BEMT}.

This result can be interpreted as showing that the particle families corresponding to the baryons and the three 
lepton generations are invariant with respect to different alternative assignments of quantum numbers: any assignment 
of quantum numbers that is (1) predictively equivalent to the quantities {C;BEMT}, and (2) based on a division of 
particles into any families must in fact be based on the baryon family and the three lepton generations. Section 4.2 
shows that the uniqueness of particle families is a general fact that holds not just for the families of the Standard 
Model. 

We applied Theorem 4 to computationally rediscover the {BEMT} quantum number assignments: a computational 
search for an extension Q of {C} to a basis for $E^\perp$ that minimizes the sum of the absolute values of the quantum
numbers yields {BEMT} as the solution (up to sign). Theorem 4 establishes a tight relationship between particle 
dynamics and particle taxonomy: a given particle taxonomy suggests an explanation of reaction data via family con- 
servation laws such as {BEMT}; conversely the reaction data can be used to find a unique taxonomy corresponding 
to a complete set of conserved quantities. We emphasize that the program rediscovers the baryon family and the three 
generations of leptons from reaction data alone, without any knowledge of particle families or particle properties at 
all; internally, the program represents a particle simply as a natural number. 

3 Summary and Further Applications 

We described a new algorithm for finding an optimal set of selection rules for given reaction data and several linear 
algebra theorems that provide analytic insight into properties of selection rules. We applied the algorithm to a set of 
observed transitions consisting mainly of frequent decay modes. Our computations in combination with the mathe- 
matical analysis yield the following results. (1) For any class of reactions, the number of irredundant selection rules 
is bounded above by the number of particles without a decay mode in the class (counting particle-antiparticle pairs 
just once). (2) The quantities {C;BEMT} (= charge, baryon number, electron number, muon number, tau number) 
are optimal for our data set based on frequent decays in that they explain the nonoccurrence of as many unobserved 
processes as possible. (3) Given the conservation of electric charge, any optimal set of selection rules for our data set 
that is based on dividing particles into families must correspond to the particle families defined by the baryon, electron, 
muon and tau quantum numbers. Thus these families are uniquely determined by the data. 

Since our data set includes the most probable decay mode for each particle that has one, it excludes low-frequency 
events such as neutrino oscillations and chiral anomalies. As our algorithms can be used to find symmetries in any 
class of interactions, there is no obstacle in principle to applying them to data sets that include these types of events. For
example, we could extend our data set E to include rare decays such as $\mu^- \to e^- + \nu_e + \bar{\nu}_\mu$. On this input data we
expect the algorithm to find the conservation of electric charge, baryon number, and lepton number. Additional input 
processes could include reactions that violate the conservation of baryon number; in that case our hypothesis is that 
the algorithm will indicate the conservation of electric charge and baryon-lepton number. 

4 Methods and Proofs 

4.1 Design of the Particle and Reaction Database 

The 193-particle database is a comprehensive catalog of the known particles; some particles were excluded by the 
following criteria. (1) The database contains only particles included in the summary table [12], which excludes some 
particles whose existence and properties need further confirmation. (2) We omitted some resonances. (3) We did
not include quarks. (4) We included separate entries for each particle and its antiparticle, for example the proton 
$p$ and its antiparticle $\bar{p}$ are listed separately. The reason is to see if the program can rediscover from the reaction
data which particle pairs behave like antiparticles of each other, which it does. The complete particle and reaction 
databases are available in Excel format at http://www.cs.sfu.ca/ oschulte/particles/ , and a list of all included particles
at http://www.cs.sfu.ca/ oschulte/particles/particle-list.txt . 

Proposition 1 is the rationale for basing our reaction database on decays. The proof is as follows. Without loss of
generality, assume that particles are numbered by mass in descending order, so that $\mathrm{mass}(p_i) \geq \mathrm{mass}(p_j)$ whenever
$i < j$. It is well-known that conservation of energy and momentum implies that if $d_i = p_i \to p^0 + p^1 + p^2 + \cdots + p^m$
is a possible decay, then $\mathrm{mass}(p^j) < \mathrm{mass}(p_i)$ for $j = 0, \ldots, m$. Let D be the matrix whose rows are the vectors
$d_1, \ldots, d_k$ representing the decays of distinct particles. Fix i and consider $j < i$; then $\mathrm{mass}(p_j) \geq \mathrm{mass}(p_i)$, and so
particle $p_j$ does not occur in decay $d_i$; hence $D_{ij} = 0$. Moreover $D_{ii} = 1$, because $p_i$ is the reagent of $d_i$ and cannot
occur among its strictly lighter products. Since this holds for arbitrary i and $j < i$, it follows that D is upper
triangular with nonzero diagonal entries, and so the set of its row vectors is linearly independent.
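
The triangular structure used in this proof can be observed on a small example. The sketch below rests on assumptions: the particle selection, the rounded masses in MeV (neutrinos treated as massless), and the names `decay_row` and `leading_column` are illustrative and do not come from the reaction database. It orders particles by descending mass and checks that the first nonzero entry of each decay vector sits in the column of the decaying particle.

```python
# Approximate masses in MeV, for illustration only; neutrinos are treated as massless.
MASS = {"n": 939.57, "p": 938.27, "pi-": 139.57, "mu-": 105.66,
        "e-": 0.511, "nu_e_bar": 0.0, "nu_mu": 0.0, "nu_mu_bar": 0.0}

# Number the particles by mass in descending order, as in the proof.
ORDER = sorted(MASS, key=MASS.get, reverse=True)
COL = {name: j for j, name in enumerate(ORDER)}

# One decay mode per decaying particle (hypothetical toy selection).
DECAYS = {
    "n": ["p", "e-", "nu_e_bar"],
    "pi-": ["mu-", "nu_mu_bar"],
    "mu-": ["e-", "nu_e_bar", "nu_mu"],
}

def decay_row(parent, products):
    """Reaction vector of a decay: +1 for the parent, -1 per occurrence of each product."""
    row = [0] * len(ORDER)
    row[COL[parent]] += 1
    for prod in products:
        row[COL[prod]] -= 1
    return row

def leading_column(row):
    """Index of the first nonzero entry of a row."""
    return next(j for j, x in enumerate(row) if x != 0)

rows = {parent: decay_row(parent, prods) for parent, prods in DECAYS.items()}

# Every product is lighter than its parent, so each row starts at the parent's column.
echelon = all(leading_column(row) == COL[parent] for parent, row in rows.items())
print(echelon)
```

Because the leading entries fall in distinct columns, the rows are in echelon form, which is the linear independence claimed by Proposition 1.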

We omit the proof of Proposition 3 which has the same basic idea. 

4.2 Determination of Particle Families by Reaction Data 

We first show that any two family bases have the same carriers. Recall that $\mathrm{carriers}(v) = \{i \mid v(i) \neq 0\}$, and that B is
a family basis if for distinct $v_1, v_2 \in B$ we have $\mathrm{carriers}(v_1) \cap \mathrm{carriers}(v_2) = \emptyset$. Say that a basis B' is a multiple of a basis B
if for every vector v' in B', there is a vector v in B and a scalar a such that $v' = av$. If $v = av'$ for $a \neq 0$, then v and
v' have the same carriers, so the bases B and B' determine the same families if B' is a multiple of a family basis B.

Proposition 5. Suppose that B, B' are family bases for a linear space V. Then B' is a multiple of B. 

Proof. Let v' be a vector in B'. Then we may write $v' = \sum_{i=1}^{n} a_i v_i$, where $v_i \in B$, with $n = \dim(V)$. Since B' is
a basis, $v' \neq 0$ and there exists $a_i \neq 0$. Then

$$\mathrm{carriers}(v_i) \subseteq \mathrm{carriers}(v')$$

because B is a family set. We may write

$$v_i = \sum_{k=1}^{n-1} b_k v'_k + b v',$$

where each $v'_k \neq v'$ is in B'. Since B' is a family set, the carriers of $\sum_{k=1}^{n-1} b_k v'_k$ are disjoint from those of v'. As the
carriers of v' include those of $v_i$, it follows that $v_i = bv'$ where $b \neq 0$, and so $v' = \frac{1}{b} v_i$. Since this holds for any
vector in B', it follows that B' is a multiple of B. □

The basis {C;BEMT} is not a family basis for $E^\perp$ because the carriers of electric charge C occur among the
carriers of all other conserved quantities. Electric charge has the special status of being logically independent of the 
other quantities {BEMT}; we define this notion as follows. A quantity q is logically independent of a set of quantities 
B if for all v in B we have that (1) some carrier of v is not a carrier of q, (2) some carrier of q is not a carrier of v, 
and (3) some particle carries both v and q.

Theorem 6. Let $B \cup \{q\}$ be a basis for V such that $|B| > 2$, B is a family set and q is logically independent of B. Let
B' be another family set such that $B' \cup \{q\}$ is a basis for V. Then B' is a linear multiple of B.

Proof. Let $v_q$ be the vector v with 0 assigned to all carriers of q (i.e. $v_q(i) = 0$ if $q(i) \neq 0$, and $v_q(i) = v(i)$
otherwise). For a set of vectors U, let $U_q = \{v_q \mid v \in U\}$. It is easy to verify that $B_q$ and $B'_q$ are bases for $V_q$, so the
previous Proposition implies that $B'_q$ is a multiple of $B_q$. Thus every quantity $v'_i$ in B' is of the form

$$v'_i = a_i v_i + b_i q$$

for some vector $v_i \in B$; we argue by contradiction that $b_i = 0$ for all $v'_i \in B'$, which establishes the theorem.

Case 1: there are two distinct $v'_i, v'_j$ such that $b_i \neq 0$ and $b_j \neq 0$. Since $|B| > 2$, there is a $v_k$ different from $v_i$ and
$v_j$. As B is a family set and q is logically independent of B, this implies that there is a particle p carrying both
$v_k$ and q but neither $v_i$ nor $v_j$. So p carries $v'_i = a_i v_i + b_i q$ and also $v'_j = a_j v_j + b_j q$, which contradicts the supposition
that B' is a family set.

Case 2: there is exactly one $v'_i$ such that $b_i \neq 0$. Choose a vector $v_j$ and a particle p such that p carries both $v_j$
and q, but not $v_i$. Then p carries $v'_i = a_i v_i + b_i q$, and since $v'_j = a_j v_j$, the particle p carries $v'_j$ as well, which
contradicts the supposition that B' is a family set.

In either case we arrive at a contradiction, which shows that B' is a linear multiple of B. □ 
Theorem 4 follows immediately by setting q = C and B = BEMT.